We converted values stored as text into numeric values.
We split cells holding multiple entries into multiple columns with one entry each (for example, the language columns l1 to l7 and the country columns c1 to c9).
We removed columns that were irrelevant to the analysis.
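The report does not show its cleaning code, so the following is only a minimal Python sketch of the two transformations described above, not the authors' implementation. The helper names `to_number` and `split_cell` and the sample strings are hypothetical. Only the column prefix `l` comes from the report.

```python
import re


def to_number(text):
    """Strip non-numeric characters (units, currency signs, thousands
    separators) and return a float, or None when no number is present."""
    if text is None:
        return None
    cleaned = re.sub(r"[^0-9.]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        # Covers empty strings, "N/A", and malformed values such as "1.2.3".
        return None


def split_cell(text, prefix, width):
    """Split a comma-separated cell into `width` columns named
    prefix1..prefixN, padding missing entries with None."""
    parts = [p.strip() for p in text.split(",")] if text else []
    parts = [p for p in parts if p]
    padded = parts[:width] + [None] * (width - len(parts))
    return {f"{prefix}{i + 1}": value for i, value in enumerate(padded)}


# Hypothetical inputs, for illustration only.
runtime = to_number("142 min")
languages = split_cell("English, French", "l", 3)
```

Padding with `None` keeps every row the same width, which is what makes a per-column completeness check meaningful afterwards.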
We tested each column for completeness, measured as the fraction of rows with a non-missing value.
##          Num        Title         Year      Runtime     Director       Writer
##        1.000        1.000        1.000        1.000        1.000        0.996
##    Metascore   imdbRating    imdbVotes    BoxOffice   Production  First Actor
##        0.708        1.000        1.000        0.300        1.000        1.000
## Second Actor  Third Actor Fourth Actor       Awards  Nominations  First Genre
##        1.000        1.000        1.000        1.000        1.000        1.000
## Second Genre  Third Genre           l1           l2           l3           l4
##        0.908        0.632        1.000        0.496        0.224        0.088
##           l5           l6           l7           c1           c2           c3
##        0.032        0.012        0.004        1.000        0.296        0.088
##           c4           c5           c6           c7           c8           c9
##        0.024        0.012        0.004        0.004        0.004        0.004
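A completeness figure like those above can be computed as the share of rows in which a column holds a usable value. The sketch below is a hedged Python illustration, not the code used for the report. The function name `completeness`, the choice of `None`, `""`, and `"N/A"` as missing markers, and the sample rows are all assumptions.

```python
def completeness(rows, missing=(None, "", "N/A")):
    """Return {column: fraction of rows holding a non-missing value}."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    total = len(rows)
    result = {}
    for name in columns:
        present = sum(1 for row in rows if row.get(name) not in missing)
        result[name] = present / total if total else 0.0
    return result


# Hypothetical rows, for illustration only.
sample_rows = [
    {"Title": "Film A", "Metascore": 80, "l2": "French"},
    {"Title": "Film B", "Metascore": None, "l2": None},
    {"Title": "Film C", "Metascore": 75, "l2": None},
    {"Title": "Film D", "Metascore": 90, "l2": "German"},
]
scores = completeness(sample_rows)
```

Read this way, low values for columns such as l7 or c9 are expected: few films list a seventh language or a ninth country, so those columns are mostly padding.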
We examined each variable on its own and presented its characteristics in an appropriate chart.
There were 155 unique directors.
That is too many to plot, so we plotted only directors who occur four or more times.
## x freq
## 4 Alfred Hitchcock 9
## 9 Billy Wilder 7
## 14 Charles Chaplin 5
## 16 Christopher Nolan 7
## 36 Frank Capra 4
## 86 Martin Scorsese 7
## 108 Quentin Tarantino 4
## 115 Ridley Scott 4
## 134 Stanley Kubrick 8
## 136 Steven Spielberg 7
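The unique count and the "four or more" filter follow the same pattern for every categorical variable in this section. The following Python sketch shows one way to express it. It is an illustration under assumptions, not the report's code: the function name `summarize` and the placeholder names in the example are hypothetical.

```python
from collections import Counter


def summarize(values, minimum=4):
    """Return (number of unique values, [(value, count), ...]) where the
    list keeps only values occurring at least `minimum` times, sorted
    alphabetically like the frequency tables in the report."""
    counts = Counter(v for v in values if v is not None)
    frequent = sorted(
        (name, n) for name, n in counts.items() if n >= minimum
    )
    return len(counts), frequent


# Hypothetical column of director names, for illustration only.
directors = ["A"] * 5 + ["B"] * 4 + ["C"] * 3 + [None]
unique_count, frequent_directors = summarize(directors)
```

The threshold is a readability choice: it keeps the chart legible, but it hides the many values that occur only once or twice.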
There were 213 unique writers.
That is too many to plot, so we plotted only writers who occur four or more times.
## x freq
## 27 Charles Chaplin 5
## 158 Quentin Tarantino 4
## 185 Stanley Kubrick 6
## 187 Stephen King 4
There were 89 unique production companies.
That is too many to plot, so we plotted only production companies that occur four or more times.
## x freq
## 1 20th Century Fox 15
## 9 Buena Vista Pictures 4
## 11 Columbia Pictures 10
## 41 MGM 9
## 45 Miramax Films 7
## 46 New Line Cinema 6
## 55 Paramount Pictures 17
## 62 Sony Pictures 5
## 72 Twentieth Century Fox Home Entertainment 4
## 73 United Artists 15
## 78 Universal Pictures 14
## 81 Walt Disney Pictures 8
## 82 Warner Bros. 10
## 83 Warner Bros. Pictures 27
There were 773 unique actors and actresses.
That is too many to plot, so we plotted only actors and actresses who occur four or more times.
## x freq
## 10 Al Pacino 4
## 97 Carrie Fisher 4
## 101 Cary Grant 5
## 106 Charles Chaplin 4
## 120 Christian Bale 4
## 281 Harrison Ford 7
There were 28 movies with 75 or more awards (first six shown).
## Title Awards imdbRating
## 1: The Dark Knight 153 9.0
## 2: Schindler's List 78 8.9
## 3: The Lord of the Rings: The Return of the King 208 8.9
## 4: The Lord of the Rings: The Fellowship of the Ring 117 8.8
## 5: Inception 154 8.8
## 6: The Lord of the Rings: The Two Towers 120 8.7
There were 25 movies with 120 or more nominations (first six shown).
## Title Nominations imdbRating
## 1: The Dark Knight 153 9.0
## 2: The Lord of the Rings: The Return of the King 122 8.9
## 3: The Lord of the Rings: The Fellowship of the Ring 124 8.8
## 4: Inception 203 8.8
## 5: The Lord of the Rings: The Two Towers 138 8.7
## 6: Interstellar 142 8.6
There are 34 movies with 75 or more awards or 120 or more nominations (first six shown).
## Title Awards Nominations imdbRating
## 1: The Dark Knight 153 153 9.0
## 2: Schindler's List 78 33 8.9
## 3: The Lord of the Rings: The Return of the King 208 122 8.9
## 4: The Lord of the Rings: The Fellowship of the Ring 117 124 8.8
## 5: Inception 154 203 8.8
## 6: The Lord of the Rings: The Two Towers 120 138 8.7
There are 19 movies with both 75 or more awards and 120 or more nominations (first six shown).
## Title Awards Nominations imdbRating
## 1: The Dark Knight 153 153 9.0
## 2: The Lord of the Rings: The Return of the King 208 122 8.9
## 3: The Lord of the Rings: The Fellowship of the Ring 117 124 8.8
## 4: Inception 154 203 8.8
## 5: The Lord of the Rings: The Two Towers 120 138 8.7
## 6: The Departed 96 134 8.5
There are 9 movies with 75 or more awards but fewer than 120 nominations (first six shown).
## Title Awards Nominations imdbRating
## 1: Schindler's List 78 33 8.9
## 2: Saving Private Ryan 79 74 8.6
## 3: WALL·E 91 90 8.4
## 4: American Beauty 108 98 8.4
## 5: L.A. Confidential 87 77 8.3
## 6: Up 76 82 8.3
There are 6 movies with fewer than 75 awards but 120 or more nominations (first six shown).
## Title Awards Nominations imdbRating
## 1: Interstellar 42 142 8.6
## 2: Django Unchained 58 151 8.4
## 3: The Wolf of Wall Street 38 170 8.2
## 4: Gone Girl 64 177 8.1
## 5: The Imitation Game 45 150 8.1
## 6: The Martian 34 187 8.0
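The five counts above are linked by inclusion-exclusion: 28 + 25 - 19 = 34 movies meet either threshold, 28 - 19 = 9 meet only the awards threshold, and 25 - 19 = 6 meet only the nominations threshold. The Python sketch below reproduces the grouping with set operations. The function name `partition` is hypothetical, titles are assumed to be unique, and the four sample rows are copied from the tables above.

```python
def partition(rows, min_awards=75, min_nominations=120):
    """Group movie titles by which threshold(s) they meet.
    Assumes titles are unique, since sets are keyed on the title."""
    awards = {r["Title"] for r in rows if r["Awards"] >= min_awards}
    noms = {r["Title"] for r in rows if r["Nominations"] >= min_nominations}
    return {
        "either": awards | noms,            # 75+ awards or 120+ nominations
        "both": awards & noms,              # 75+ awards and 120+ nominations
        "awards_only": awards - noms,       # 75+ awards, under 120 nominations
        "nominations_only": noms - awards,  # under 75 awards, 120+ nominations
    }


# Four rows taken from the tables in this section.
movies = [
    {"Title": "The Dark Knight", "Awards": 153, "Nominations": 153},
    {"Title": "Schindler's List", "Awards": 78, "Nominations": 33},
    {"Title": "Inception", "Awards": 154, "Nominations": 203},
    {"Title": "Interstellar", "Awards": 42, "Nominations": 142},
]
groups = partition(movies)

# Consistency check on the counts reported in the text.
n_awards, n_noms, n_both = 28, 25, 19
n_either = n_awards + n_noms - n_both
```

Because the three disjoint groups (both, awards only, nominations only) sum to the "either" group, 19 + 9 + 6 = 34 is a quick check that the reported counts agree with each other.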
There are 24 unique genres.
There are 44 unique languages.
## x freq
## 3 Arabic 8
## 5 Cantonese 4
## 8 English 250
## 10 French 41
## 11 German 32
## 18 Italian 17
## 19 Japanese 6
## 21 Latin 13
## 31 Russian 12
## 35 Spanish 31
## 40 Vietnamese 4
There are 31 unique countries.
## x freq
## 1 Australia 6
## 4 Canada 6
## 7 France 12
## 8 Germany 11
## 12 Ireland 4
## 13 Italy 4
## 27 UK 55
## 29 USA 233
##                    Year     Runtime   Metascore  imdbRating   imdbVotes   BoxOffice      Awards Nominations
## Year         1.00000000  0.17938049 -0.34085225  0.04549597  0.53623097  0.33354876   0.4719894  0.61632678
## Runtime      0.17938049  1.00000000 -0.06702619  0.24776081  0.24974676  0.16853031   0.1911893  0.18631483
## Metascore   -0.34085225 -0.06702619  1.00000000  0.17211994 -0.09800265  0.07890235   0.3071728  0.11100696
## imdbRating   0.04549597  0.24776081  0.17211994  1.00000000  0.65668014  0.12349854   0.1997924  0.08192601
## imdbVotes    0.53623097  0.24974676 -0.09800265  0.65668014  1.00000000  0.38711525   0.4458007  0.47425298
## BoxOffice    0.33354876  0.16853031  0.07890235  0.12349854  0.38711525  1.00000000   0.1831188  0.20620724
## Awards       0.47198940  0.19118929  0.30717283  0.19979238  0.44580066  0.18311879   1.0000000  0.84812343
## Nominations  0.61632678  0.18631483  0.11100696  0.08192601  0.47425298  0.20620724   0.8481234  1.00000000
The correlation plots show that Year is moderately correlated with several other variables: Nominations (0.62), imdbVotes (0.54), Awards (0.47), BoxOffice (0.33), and, negatively, Metascore (-0.34). In contrast, imdbRating is only weakly correlated with every variable except imdbVotes (0.66). This is interesting because it suggests that imdbRating is not closely related to factors we would expect to track it, i.e. Metascore (0.17), Awards (0.20), and Nominations (0.08).
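Each entry in the correlation matrix is a Pearson coefficient between two numeric columns. The report does not say how missing values were handled. The Python sketch below assumes pairwise deletion, meaning a row is dropped only when one of the two values being compared is missing. The function name `pearson` and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, skipping any
    pair where either value is None (pairwise deletion, an assumption).
    Returns None when fewer than two pairs remain or a series is constant."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
perfect = pearson([1, 2, 3], [2, 4, 6])
with_gap = pearson([1, 2, 3, None], [3, 2, 1, 5])
```

Under pairwise deletion, a coefficient involving a sparsely filled column rests on far fewer films than one involving fully populated columns, so the entries of one matrix are not all based on the same sample.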
We plotted imdbRating against each of the other numeric variables. The points show no clear pattern except against imdbVotes. Missing data may be part of the reason: Metascore is only 70.8% complete and BoxOffice only 30% complete. The Awards and Nominations counts may also be incomplete or incorrect, because some movies are recent and their totals have not yet been updated.
Melvin’s contribution:
- Melvin considered which variables to analyze (IMDb and Metascore ratings with Awards and BoxOffice) and how to investigate the trends between them.
- Melvin created an interactive scatter plot using ggplot and plotly. His code shows information about each film (a point in the scatter plot) through ggplot's label and shape features.
- He also took into account the story told by the data and the conclusions that could be drawn from it.
- The first scatter plot investigates the relationship between Awards, BoxOffice, and Metascore, and compares averages of award counts and BoxOffice for each film.
- The second scatter plot looks more closely at the top-left region of the graph, investigating the genres of the films in that area.
Raymond’s contribution:
- Raymond explored the relationship between imdbRating and Nominations.
- For the first graph, Raymond used ggplot with x = Nominations and y = imdbRating.
- Raymond then made a second ggplot with x = Year and y = Nominations.
- Finally, Raymond explored the relationship between imdbRating, imdbVotes, and Nominations in an interactive 3D plotly graph, where x = imdbRating, y = imdbVotes, and z = Nominations.
- Raymond's conclusion was that a higher imdbRating does not mean a movie will have more nominations.
Stephanie’s contribution:
- Stephanie examined production companies and found the following:
  - Warner Bros. has the highest ratings and the most awards.
  - Disney doesn’t get much recognition in this area.
  - Production companies with lower ratings (and by low we mean 8.0) collectively have quite a few awards.
  - Is the population harsher than the Academy? Population mean number of votes: 431401.3. Voters in the Academy: 9,427 as of Dec 2020.
Paul’s contribution:
- Data munging: the entire data cleaning and munging process
- EDA: all single-variable analysis
- EDA: correlation plot across all numeric variables
- EDA: analysis of imdbRating against the other numeric variables
- EDA: distribution of imdbRating by genre
- PowerPoint: creating the slides and merging them into a single coherent presentation
- Report: converting the PowerPoint into report form